NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Machine Learning Interpretation of Optical Spectroscopy Using Peak-Sensitive Logistic Regression

https://doi.org/10.1021/acsnano.4c16037

Wang, Ziyang; Ranasinghe, Jeewan C; Wu, Wenjing; Chan, Dennis C Y; Gomm, Ashley; Tanzi, Rudolph E; Zhang, Can; Zhang, Nanyin; Allen, Genevera I; Huang, Shengxi (April 2025, ACS Nano)

Free, publicly accessible full text available April 29, 2026
Graphical Model Inference with Erroneously Measured Data

https://doi.org/10.1080/01621459.2023.2256503

Zheng, Lili; Allen, Genevera I (July 2024, Journal of the American Statistical Association)

Full Text Available
Fair Feature Importance Scores for Interpreting Tree-Based Methods and Surrogates

Little, Camille O; Lina, Debolina H; Allen, Genevera I (June 2024, Transactions on Machine Learning Research)

Full Text Available
Interpretable Machine Learning for Discovery: Statistical Challenges and Opportunities

https://doi.org/10.1146/annurev-statistics-040120-030919

Allen, Genevera I; Gan, Luqin; Zheng, Lili (April 2024, Annual Review of Statistics and Its Application)

New technologies have led to vast troves of large and complex data sets across many scientific domains and industries. People routinely use machine learning techniques not only to process, visualize, and make predictions from these big data, but also to make data-driven discoveries. These discoveries are often made using interpretable machine learning, or machine learning models and techniques that yield human-understandable insights. In this article, we discuss and review the field of interpretable machine learning, focusing especially on the techniques, as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using interpretable machine learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation both from a practical perspective, reviewing approaches based on data-splitting and stability, and from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven discoveries.
Full Text Available
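The abstract above mentions validating discoveries through data-splitting and stability. The following is a minimal sketch of the stability idea under stated assumptions: the base selector (top-k features by absolute marginal correlation), the half-sample subsampling scheme, the 100 repetitions, and the 0.8 frequency cutoff are illustrative choices, not procedures taken from the article. Features selected in most random subsamples are treated as stable candidate discoveries.

```python
import numpy as np


def top_k_by_correlation(X, y, k):
    """Stand-in selector (an assumption, not from the article): indices of the
    k features with the largest absolute Pearson correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(corr)[::-1][:k]


def selection_frequencies(X, y, k=3, n_subsamples=100, seed=0):
    """Fraction of random half-subsamples in which each feature is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw a random half of the observations without replacement
        idx = rng.choice(n, size=n // 2, replace=False)
        counts[top_k_by_correlation(X[idx], y[idx], k)] += 1
    return counts / n_subsamples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 10
    X = rng.standard_normal((n, p))
    # Synthetic data: only features 0, 1 and 2 carry signal
    y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.5 * rng.standard_normal(n)
    freq = selection_frequencies(X, y, k=3)
    print("selection frequencies:", np.round(freq, 2))
    # Illustrative cutoff: call a feature stable if selected in >= 80% of subsamples
    print("stable features:", np.flatnonzero(freq >= 0.8))
```

Any other selection procedure (for example, a sparse regression) can be swapped in for the stand-in selector; the stability step only needs the indices each fit selects.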
Data Augmentation via Subgroup Mixup for Improving Fairness

https://doi.org/10.1109/ICASSP48485.2024.10446564

Navarro, Madeline; Little, Camille; Allen, Genevera I; Segarra, Santiago (April 2024, IEEE)

Full Text Available
Addressing Confounds in Functional Connectivity Analyses of Calcium Imaging

https://doi.org/10.1109/ICASSP48485.2024.10447836

Ye, Dingding; Santhirasegaran, Charan; Pai, Ryan; Allen, Genevera I; Young, Joseph (April 2024, IEEE)

Full Text Available
Subbotin graphical models for extreme value dependencies with applications to functional neuronal connectivity

https://doi.org/10.1214/22-AOAS1723

Chang, Andersen; Allen, Genevera I (September 2023, The Annals of Applied Statistics)

Full Text Available
Supervised Convex Clustering

https://doi.org/10.1111/biom.13860

Wang, Minjie; Yao, Tianyi; Allen, Genevera I (March 2023, Biometrics)

Clustering has long been a popular unsupervised learning approach to identify groups of similar objects and discover patterns from unlabeled data in many applications. Yet, coming up with meaningful interpretations of the estimated clusters has often been challenging precisely due to their unsupervised nature. Meanwhile, in many real-world scenarios, there are some noisy supervising auxiliary variables, for instance, subjective diagnostic opinions, that are related to the observed heterogeneity of the unlabeled data. By leveraging information from both supervising auxiliary variables and unlabeled data, we seek to uncover more scientifically interpretable group structures that may be hidden by completely unsupervised analyses. In this work, we propose and develop a new statistical pattern discovery method named supervised convex clustering (SCC) that borrows strength from both information sources and guides towards finding more interpretable patterns via a joint convex fusion penalty. We develop several extensions of SCC to integrate different types of supervising auxiliary variables, to adjust for additional covariates, and to find biclusters. We demonstrate the practical advantages of SCC through simulations and a case study on Alzheimer's disease genomics. Specifically, we discover new candidate genes as well as new subtypes of Alzheimer's disease that can potentially lead to better understanding of the underlying genetic mechanisms responsible for the observed heterogeneity of cognitive decline in older adults.
Thresholded graphical lasso adjusts for latent variables

https://doi.org/10.1093/biomet/asac060

Wang, Minjie; Allen, Genevera I (November 2022, Biometrika)

Structural learning of Gaussian graphical models in the presence of latent variables has long been a challenging problem. Chandrasekaran et al. (2012) proposed a convex program for estimating a sparse graph plus a low-rank term that adjusts for latent variables; however, this approach poses challenges from both computational and statistical perspectives. We propose an alternative, simple solution: apply a hard-thresholding operator to existing graph selection methods. Conceptually simple and computationally attractive, the approach of thresholding the graphical lasso is shown to be graph selection consistent in the presence of latent variables under a simpler minimum edge strength condition and at an improved statistical rate. The results are extended to estimators for thresholded neighbourhood selection and constrained $\ell_1$-minimization for inverse matrix estimation as well. We show that our simple thresholded graph estimators yield stronger empirical results than existing methods for the latent variable graphical model problem, and we apply them to a neuroscience case study on estimating functional neural connections.
Full Text Available
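The summary above describes applying a hard-thresholding operator on top of an existing graph selection method, and notes that the results extend to thresholded neighbourhood selection. The sketch below illustrates that idea with neighbourhood selection as the base estimator so that it needs only NumPy. The coordinate-descent lasso solver, the column standardization, the AND rule used to symmetrize edges, and the values of `lam` and `tau` are assumptions made for illustration, not the paper's implementation or tuning.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # x_j^T x_j / n for each column
    r = y.astype(float).copy()  # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (b_j added back)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            # Soft-thresholding update for coordinate j
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * (new - b[j])
            b[j] = new
    return b


def thresholded_neighbourhood_selection(X, lam, tau):
    """Boolean adjacency matrix from hard-thresholded neighbourhood selection.

    Each variable is lasso-regressed on all others (columns standardized),
    coefficients with |value| <= tau are set to zero, and an edge (j, k) is
    kept only if both regressions retain it (AND rule, an assumption here).
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    B = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        B[j, others] = lasso_cd(Xs[:, others], Xs[:, j], lam)
    # Hard-thresholding operator applied to the estimated coefficients
    B_thr = np.where(np.abs(B) > tau, B, 0.0)
    return (B_thr != 0) & (B_thr.T != 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n = 6, 500
    # Chain-graph precision matrix: nonzero entries only between neighbours
    Theta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
    X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta), size=n)
    adj = thresholded_neighbourhood_selection(X, lam=0.05, tau=0.15)
    print(adj.astype(int))
```

With scikit-learn available, the same thresholding step could instead be applied to the `precision_` attribute of a fitted `sklearn.covariance.GraphicalLasso` model, which is closer to the thresholded graphical lasso named in the title.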
Fast and interpretable consensus clustering via minipatch learning

Gan, Luqin; Allen, Genevera I (October 2022, PLOS Computational Biology)

Full Text Available